

Search: all records where Creators/Authors contains "Schmitt, Paul"

Note: Clicking a Digital Object Identifier (DOI) link takes you to an external site maintained by the publisher. Some full-text articles may not be available free of charge during the embargo (an administrative interval).

Some links on this page may take you to non-federal websites, whose policies may differ from this site's.

  1. Recently, much attention has been devoted to the development of generative network traces and their potential use in supplementing real-world data for a variety of data-driven networking tasks. Yet the utility of existing synthetic traffic approaches is limited by their low fidelity: coarse feature granularity, insufficient adherence to task constraints, and subpar class coverage. As effective network tasks increasingly rely on raw packet captures, we advocate a paradigm shift from coarse-grained to fine-grained, constraint-compliant traffic generation. We explore this path using controllable diffusion-based methods. Our preliminary results suggest their effectiveness in generating realistic, fine-grained network traces that mirror the complexity and variety of real network traffic required for accurate service recognition. We further outline the challenges and opportunities of this approach and discuss a research agenda toward text-to-traffic synthesis.
    Free, publicly-accessible full text available November 28, 2024
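    A minimal, hypothetical sketch of the controllable-diffusion idea above: DDPM-style ancestral sampling over per-packet feature vectors (e.g., size, inter-arrival time, direction). The feature layout, noise schedule, and placeholder denoiser are assumptions for illustration; the paper's actual model and conditioning mechanism are not described in the abstract.

      import numpy as np

      T = 50                                  # number of diffusion steps
      betas = np.linspace(1e-4, 0.02, T)      # linear noise schedule
      alphas = 1.0 - betas
      alpha_bars = np.cumprod(alphas)

      def denoiser(x, t, condition):
          """Stand-in for a learned noise-prediction network; a real model
          would use `condition` (e.g., a target service class) to steer it."""
          return np.zeros_like(x)             # placeholder: predicts zero noise

      def sample_trace(n_packets, n_features, condition, rng):
          """Reverse diffusion: start from Gaussian noise, iteratively denoise."""
          x = rng.standard_normal((n_packets, n_features))
          for t in reversed(range(T)):
              eps_hat = denoiser(x, t, condition)
              # DDPM posterior mean for x_{t-1} given x_t and predicted noise
              x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
              if t > 0:
                  x += np.sqrt(betas[t]) * rng.standard_normal(x.shape)
          return x

      trace = sample_trace(32, 3, "video", np.random.default_rng(0))

    With a trained denoiser, the condition is what makes generation controllable: the same sampler, steered toward different service classes, yields class-specific traces.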
  2. Operational networks commonly rely on machine learning models for many tasks, including detecting anomalies, inferring application performance, and forecasting demand. Yet model accuracy can degrade due to concept drift, whereby the relationship between the features and the target to be predicted changes. Mitigating concept drift is an essential part of operationalizing machine learning models in general, but it is of particular importance in networking's highly dynamic deployment environments. In this paper, we first characterize concept drift in a large cellular network for a major metropolitan area in the United States. We find that concept drift occurs across many important key performance indicators (KPIs), independently of the model, training-set size, and time interval, thus necessitating practical approaches to detect, explain, and mitigate it. We then show that frequent model retraining with newly available data is not sufficient to mitigate concept drift and can even degrade model accuracy further. Finally, we develop a new methodology for concept drift mitigation, Local Error Approximation of Features (LEAF). LEAF works by detecting drift; explaining the features and time intervals that contribute the most to drift; and mitigating drift using forgetting and over-sampling. We evaluate LEAF against industry-standard mitigation approaches (notably, periodic retraining) with more than four years of cellular KPI data. Our initial tests with a major cellular provider in the US show that LEAF consistently outperforms periodic and triggered retraining on complex, real-world data while reducing costly retraining operations.
    Free, publicly-accessible full text available September 28, 2024
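    The abstract names LEAF's three stages (detect, explain, mitigate via forgetting and over-sampling) but not their internals. The sketch below fills them in with assumed choices: a rolling-error drift detector, a correlation-based stand-in for the error-approximation explainer, and naive forgetting/over-sampling; none of these are the paper's actual mechanisms.

      import numpy as np

      def detect_drift(errors, window=30, threshold=1.5):
          """Flag drift when the recent mean error exceeds the long-run mean."""
          recent = np.mean(errors[-window:])
          baseline = np.mean(errors[:-window]) if len(errors) > window else recent
          return recent > threshold * baseline

      def explain_drift(X, errors):
          """Rank features by |correlation| with per-sample error, a stand-in
          for locating where in feature space the error concentrates."""
          corr = [abs(np.corrcoef(X[:, j], errors)[0, 1]) for j in range(X.shape[1])]
          return np.argsort(corr)[::-1]       # most drift-implicated features first

      def mitigate(X, y, errors, forget_frac=0.3, oversample_frac=0.2):
          """Forget the oldest samples; over-sample the highest-error ones."""
          keep = int(len(X) * (1 - forget_frac))
          X, y, errors = X[-keep:], y[-keep:], errors[-keep:]
          hard = np.argsort(errors)[-int(len(X) * oversample_frac):]
          return np.vstack([X, X[hard]]), np.concatenate([y, y[hard]])

    In a deployment loop, detect_drift would gate when explain_drift and mitigate run, with the model retrained on the reweighted data afterward.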
  3. Network management often relies on machine learning to make predictions about performance and security from network traffic. Often, the representation of the traffic is as important as the choice of the model. The features that the model relies on, and the representation of those features, ultimately determine model accuracy, as well as where and whether the model can be deployed in practice. Thus, the design and evaluation of these models ultimately requires understanding not only model accuracy but also the systems costs associated with deploying the model in an operational network. Towards this goal, this paper develops a new framework and system that enable a joint evaluation of both the conventional notions of machine learning performance (e.g., model accuracy) and the systems-level costs of different representations of network traffic. We highlight these two dimensions for two practical network management tasks, video streaming quality inference and malware detection, to demonstrate the importance of exploring different representations to find the appropriate operating point. We demonstrate the benefit of exploring a range of representations of network traffic and present Traffic Refinery, a proof-of-concept implementation that both monitors network traffic at 10 Gbps and transforms traffic in real time to produce a variety of feature representations for machine learning. Traffic Refinery both highlights this design space and makes it possible to explore different representations for learning, balancing systems costs related to feature extraction and model training against model accuracy.
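    The trade-off the paper explores, richer traffic representations versus the systems cost of producing them, can be illustrated with a toy measurement: compute two candidate feature representations of the same packet stream and time each. The representation names and the packet-tuple layout are illustrative, not Traffic Refinery's actual API.

      import time
      from collections import Counter

      # Synthetic (timestamp, size, direction) tuples standing in for captured packets.
      packets = [(0.001 * i, 1500 if i % 3 else 80, "up" if i % 2 else "down")
                 for i in range(10_000)]

      def flow_counters(pkts):
          """Coarse representation: aggregate packet/byte counts."""
          return {"packets": len(pkts), "bytes": sum(s for _, s, _ in pkts)}

      def size_histogram(pkts, bucket=200):
          """Finer-grained representation: packet-size distribution."""
          return Counter(s // bucket * bucket for _, s, _ in pkts)

      for name, fn in [("flow_counters", flow_counters),
                       ("size_histogram", size_histogram)]:
          start = time.perf_counter()
          features = fn(packets)
          cost_ms = (time.perf_counter() - start) * 1e3
          print(f"{name}: {len(features)} features, {cost_ms:.2f} ms to extract")

    At line rate, this extraction cost (and the state each representation keeps) is exactly what must be balanced against the accuracy the representation buys.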
  4. The digital divide, and in particular the homework gap, has been exacerbated by the COVID-19 pandemic, laying bare not only the inequities in broadband Internet access but also how these inequities ultimately affect citizens' ability to learn, work, and play. Addressing these inequities ultimately requires holistic, "full stack" data on the nature of the gaps in infrastructure and uptake: from the physical infrastructure (e.g., fiber, cable) to speed and application performance to affordability and the neighborhood effects that ultimately determine whether a technology is adopted. This paper surveys how various existing datasets can (and cannot) shed light on these gaps, the limitations of those datasets, what we know from existing data about how the Internet responded to shifts in traffic during COVID-19, and, importantly for the future, what data we need to better understand these problems moving forward and how the research community, policymakers, and the public might gain access to such data. Keywords: digital divide, Internet, mapping, performance
  5. During the early weeks and months of the COVID-19 pandemic, significant changes in Internet usage occurred as a result of a sudden global shift to people working, studying, and quarantining at home. One aspect this affected was interconnection between networks. This paper explores the effects of these changes on Internet interconnection points in terms of utilization, traffic ratios, and other performance characteristics such as latency.
  7. Inferring the quality of streaming video applications is important for Internet service providers, but the fact that most video streams are encrypted makes it difficult to do so. We develop models that infer quality metrics (i.e., startup delay and resolution) for encrypted streaming video services. Our paper builds on previous work but extends it in several ways. First, the models work in deployment settings where the video sessions and segments must be identified from a mix of traffic and where the time precision of the collected traffic statistics is coarser (e.g., due to aggregation). Second, we develop a single composite model that works for a range of different services (i.e., Netflix, YouTube, Amazon, and Twitch), as opposed to just a single service. Third, unlike many previous models, our models perform predictions at finer granularity (e.g., the precise startup delay instead of just detecting short versus long delays), allowing us to draw better conclusions about the ongoing streaming quality. Fourth, we demonstrate that the models are practical through a 16-month deployment in 66 homes and provide new insights about the relationship between Internet "speed" and the quality of the corresponding video streams for a variety of services; we find that higher speeds provide only minimal improvements to startup delay and resolution.
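    A hedged sketch of the modeling task this abstract describes: regressing startup delay from coarse per-session traffic counters. The features, the synthetic data, and the random-forest model are assumptions for illustration; the paper's actual features and composite-model architecture are not given in the abstract.

      import numpy as np
      from sklearn.ensemble import RandomForestRegressor

      rng = np.random.default_rng(0)
      n = 500
      # Assumed per-session features: early throughput, segment-request count,
      # and mean packet size (all synthetic).
      X = np.column_stack([
          rng.uniform(0.1, 10.0, n),          # throughput (Mbps)
          rng.integers(1, 20, n),             # segment requests in first 10 s
          rng.uniform(400, 1500, n),          # mean packet size (bytes)
      ])
      # Toy target: startup delay shrinks with throughput, with diminishing
      # returns, loosely mirroring the finding that higher speeds help only marginally.
      y = 5.0 / X[:, 0] + rng.normal(0, 0.2, n)

      model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[:400], y[:400])
      pred = model.predict(X[400:])
      print("mean absolute error (s):", np.mean(np.abs(pred - y[400:])).round(3))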